default to PushCollector for TrainJob experiments #2650

Open
gerardcode wants to merge 1 commit into kubeflow:master from gerardcode:gerard-fix

@gerardcode

Title: feat: default to PushCollector for TrainJob experiments

Description:

  • When a TrainJob is used as the trial spec and no metricsCollectorSpec is explicitly set, the experiment now defaults to the Push collector instead of StdOut.
  • Push-based metrics collection avoids the complexity of sidecar injection for distributed TrainJob pods and aligns with the Kubeflow Training SDK's report_metrics() approach.
  • Existing behavior for all other job kinds (Job, PyTorchJob, etc.) is unchanged.
  • An explicitly set metricsCollectorSpec is always respected and never overridden.
  • Adds experiment_defaults_test.go with 4 test cases covering the new behavior.
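
The defaulting rule described above can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: the types, the `defaultCollector` helper, and the `trainJobKind` constant are hypothetical stand-ins, and treating a nil spec or an empty kind as "not explicitly set" is an assumption.

```go
package main

import "fmt"

// CollectorKind and MetricsCollectorSpec are local stand-ins for illustration;
// they are NOT the real Katib API types.
type CollectorKind string

const (
	StdOutCollector CollectorKind = "StdOut"
	PushCollector   CollectorKind = "Push"
)

type MetricsCollectorSpec struct {
	Kind CollectorKind
}

// trainJobKind is an assumed value for the trial spec's kind.
const trainJobKind = "TrainJob"

// defaultCollector (hypothetical helper) returns the collector spec to use.
// An explicitly set spec is returned unchanged. Otherwise TrainJob trials
// get Push and every other kind keeps StdOut.
func defaultCollector(trialKind string, spec *MetricsCollectorSpec) *MetricsCollectorSpec {
	// Assumption: "explicitly set" means a non-nil spec with a non-empty kind.
	if spec != nil && spec.Kind != "" {
		return spec
	}
	if trialKind == trainJobKind {
		return &MetricsCollectorSpec{Kind: PushCollector}
	}
	return &MetricsCollectorSpec{Kind: StdOutCollector}
}

func main() {
	explicit := &MetricsCollectorSpec{Kind: StdOutCollector}

	fmt.Println(defaultCollector("TrainJob", nil).Kind)      // unset spec, TrainJob trial
	fmt.Println(defaultCollector("PyTorchJob", nil).Kind)    // unset spec, other job kind
	fmt.Println(defaultCollector("TrainJob", explicit).Kind) // explicit spec is kept
}
```

Checking for an explicit spec first keeps the third bullet's guarantee (no override) independent of the job kind.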

@github-actions

🎉 Welcome to the Kubeflow Katib repo! 🎉

Thanks for opening your first PR! We're excited to have you onboard 🚀

Feel free to ask questions in the comments. Thanks again for contributing! 🙏

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
